Nearest Neighbor Classification
We thank all 3 reviewers for their thoughtful comments
We thank all 3 reviewers for their thoughtful comments.

"nearest neighbor theory papers have largely not worried too much about constants... This analysis is..."
In the evolution of the study of nearest neighbor, early work focused on consistency, and later... You are absolutely correct that very few works study the constant. We argue that this is "a feature, not..." The scope of the analysis is limited to distributed nearest neighbor classification (along with some distributional...). The latter is a fairly interesting direction, due to its connection with deep learning.

"Currently the paper has lots of small typos. Please proofread carefully and revise."
Thanks for pointing this out, and we...

"Also, I find Table 1... How is the risk percentage defined in comparison to the oracle KNN/OWNN?"

"I'd suggest adding error bars to Table 1 (for example, to denote standard deviations across experimental repeats)."
Statistical Guarantees of Distributed Nearest Neighbor Classification
Nearest neighbor is a popular nonparametric method for classification and regression with many appealing properties. In the big data era, the sheer volume and spatial/temporal disparity of big data may prohibit centrally processing and storing the data. This imposes a considerable hurdle for nearest neighbor predictions, since the entire training data set must be kept in memory. One effective way to overcome this issue is the distributed learning framework. Through majority voting, the distributed nearest neighbor classifier achieves the same rate of convergence as its oracle version in terms of the regret, up to a multiplicative constant that depends solely on the data dimension. This multiplicative difference can be eliminated by replacing majority voting with a weighted voting scheme. In addition, we provide sharp theoretical upper bounds on the number of subsamples needed for the distributed nearest neighbor classifier to reach the optimal convergence rate. Interestingly, the weighted voting scheme allows a larger number of subsamples than majority voting. Our findings are supported by numerical studies.
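As a rough illustration of the divide-and-conquer scheme the abstract describes, here is a minimal sketch of distributed kNN with majority voting over random subsamples. All function names are my own, and the weighted-voting refinement mentioned in the abstract is not shown; this is a sketch of the general idea, not the paper's exact procedure.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Binary k-nearest-neighbor prediction: majority label among
    the k training points closest to the query x."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dists)[:k]
    return int(y_train[nn].mean() >= 0.5)

def distributed_knn_predict(X, y, x, k, s, rng):
    """Split the sample into s random subsamples, run kNN on each
    'machine' separately, and aggregate the s local predictions
    by a second majority vote."""
    idx = rng.permutation(len(X))
    votes = [knn_predict(X[part], y[part], x, k)
             for part in np.array_split(idx, s)]
    return int(np.mean(votes) >= 0.5)
```

Each machine only ever touches its own subsample of size roughly n/s, which is what removes the need to hold the full training set in one place.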
Generative Local Metric Learning for Nearest Neighbor Classification
We consider the problem of learning a local metric to enhance the performance of nearest neighbor classification. Conventional metric learning methods attempt to separate data distributions in a purely discriminative manner; here we show how to take advantage of information from parametric generative models. We focus on the bias in the information-theoretic error arising from finite sampling effects, and find an appropriate local metric that maximally reduces the bias based upon knowledge from generative models. As a byproduct, the asymptotic theoretical analysis in this work relates metric learning with dimensionality reduction, which was not understood from previous discriminative approaches. Empirical experiments show that this learned local metric enhances the discriminative nearest neighbor performance on various datasets using simple class conditional generative models.
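To make the idea of a generative-model-informed metric concrete, here is a minimal sketch: fit a Gaussian to each class as a simple class-conditional generative model, then run 1-NN under a Mahalanobis distance. The `pooled_metric` choice (inverse of the average within-class covariance) is a hypothetical stand-in of my own, not the paper's bias-minimizing local metric; it only illustrates how generative fits can reshape the distance so that high-variance noise directions are downweighted.

```python
import numpy as np

def fit_class_gaussians(X, y):
    """Fit a Gaussian (mean, covariance) per class -- a simple
    class-conditional generative model."""
    return {c: (X[y == c].mean(axis=0), np.cov(X[y == c], rowvar=False))
            for c in np.unique(y)}

def pooled_metric(stats):
    """Stand-in metric (NOT the paper's method): inverse of the
    average within-class covariance, so noisy directions count less."""
    return np.linalg.inv(np.mean([cov for _, cov in stats.values()], axis=0))

def metric_nn_predict(X, y, x, A):
    """1-NN under the Mahalanobis distance d(u, v) = (u-v)^T A (u-v)."""
    diff = X - x
    return y[np.argmin(np.einsum('ij,jk,ik->i', diff, A, diff))]
```

With a high-variance nuisance dimension, Euclidean 1-NN can be pulled toward the wrong class while the Mahalanobis metric recovers the informative direction.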
Rates of Convergence for Nearest Neighbor Classification
We analyze the behavior of nearest neighbor classification in metric spaces and provide finite-sample, distribution-dependent rates of convergence under minimal assumptions. These are more general than existing bounds, and enable us, as a by-product, to establish the universal consistency of nearest neighbor in a broader range of data spaces than was previously known. We illustrate our upper and lower bounds by introducing a new smoothness class customized for nearest neighbor classification. We find, for instance, that under the Tsybakov margin condition the convergence rate of nearest neighbor matches recently established lower bounds for nonparametric classification.
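For readers unfamiliar with the margin condition invoked above, a standard formulation (following Audibert and Tsybakov, with $\beta$ the margin exponent; symbols here are my notation, not necessarily the paper's) is:

```latex
% Tsybakov (low-noise) margin condition with exponent \beta \ge 0:
% the regression function \eta(x) = \Pr(Y = 1 \mid X = x) rarely
% hovers near the decision-boundary value 1/2.
\exists\, C > 0 \ \text{such that}\ \forall t > 0:\quad
\Pr\bigl( 0 < \lvert \eta(X) - \tfrac{1}{2} \rvert \le t \bigr) \le C\, t^{\beta}.
```

If I recall the exponent correctly, combining this with $\alpha$-Hölder smoothness of $\eta$ gives a minimax excess-risk rate of order $n^{-\alpha(\beta+1)/(2\alpha+d)}$ for nonparametric classification; the abstract's claim is that nearest neighbor attains rates matching such lower bounds.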
Kamalika Chaudhuri, Sanjoy Dasgupta
Predicting Useful Neighborhoods for Lazy Local Learning
Lazy local learning methods train a classifier "on the fly" at test time, using only a subset of the training instances that are most relevant to the novel test example. The goal is to tailor the classifier to the properties of the data surrounding the test example. Existing methods assume that the instances most useful for building the local model are strictly those closest to the test example. However, this fails to account for the fact that the success of the resulting classifier depends on the full distribution of selected training instances. Rather than simply gathering the test example's nearest neighbors, we propose to predict the subset of training data that is jointly relevant to training its local model. We develop an approach to discover patterns between queries and their "good" neighborhoods using large-scale multilabel classification with compressed sensing. Given a novel test point, we estimate both the composition and size of the training subset likely to yield an accurate local model. We demonstrate the approach on image classification tasks on SUN and aPascal and show its advantages over traditional global and local approaches.
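The distance-based baseline that the paper improves upon can be sketched in a few lines: at test time, gather the k training points nearest the query and fit a tiny model on that subset only. The sketch below uses a nearest-class-centroid local model purely for illustration; the paper's contribution is to replace the "k nearest points" selection step with a learned prediction of the jointly useful neighborhood, which is not shown here.

```python
import numpy as np

def lazy_local_predict(X, y, x, k):
    """Baseline lazy local learner: select the k training points
    closest to the query x, then fit a minimal local model on that
    subset (here, nearest class centroid)."""
    nn = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    Xk, yk = X[nn], y[nn]
    centroids = {c: Xk[yk == c].mean(axis=0) for c in np.unique(yk)}
    return min(centroids, key=lambda c: np.linalg.norm(centroids[c] - x))
```

Because the model is trained per query, nothing is fit ahead of time; the cost is paid at test time, which is what "on the fly" means above.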
Review for NeurIPS paper: Statistical Guarantees of Distributed Nearest Neighbor Classification
Weaknesses: Unfortunately, I strongly believe that this paper will attract very limited interest from the research community, since the derivations are done for binary classification and nearest neighbor classification is no longer as popular as it once was, given the numerous good alternatives. To validate my claim, I looked at the recent NeurIPS 2019 paper cited as [45], which is quite similar to this paper. In one year, it has been cited only once. This is quite natural in my opinion, since deep neural networks have dominated classification, and there are good alternatives to nearest neighbor classification for large-scale data, such as hashing and approximate nearest neighbor methods. In particular, unsupervised and supervised hashing methods are quite popular for large-scale data with high-dimensional feature spaces. Therefore, I strongly believe that the impact of the paper is very limited and it will attract very little attention from the research community.
Review for NeurIPS paper: Statistical Guarantees of Distributed Nearest Neighbor Classification
The paper provides some interesting insights into distributed weighted nearest neighbor methods, with some nice theoretical implications. I believe the results would be of interest to the nonparametric statistics community. If accepted, additional experimental results comparing against widely used scalable nearest-neighbor approaches would greatly improve the practical impact of the work -- however, this suggestion is optional and at the discretion of the authors.